Improving information retrieval through correspondence analysis instead of latent semantic analysis
Abstract
The initial dimensions extracted by latent semantic analysis (LSA) of a document-term matrix have been shown to mainly display marginal effects, which are irrelevant for information retrieval. To improve the performance of LSA, the elements of the raw document-term matrix are usually weighted, and the weighting exponent of the singular values can be adjusted. An alternative information retrieval technique that ignores the marginal effects is correspondence analysis (CA). In this paper, the information retrieval performance of LSA and CA is empirically compared. Moreover, it is explored whether the two weightings also improve the performance of CA. The results for four empirical datasets show that CA always performs better than LSA. Weighting the elements of the raw data matrix can improve CA; however, the effect is data dependent and the improvement is small. Adjusting the singular value weighting exponent often improves the performance of CA; however, the extent of the improvement depends on the dataset and the number of dimensions.
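The contrast between the two techniques can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `lsa_coords` and `ca_coords`, the parameter `alpha` (standing in for the singular value weighting exponent), and the toy matrix `X` are all assumptions made for illustration. The sketch takes LSA to be a truncated SVD of the raw document-term matrix and CA to be an SVD of the standardized residuals from the independence model, which is the textbook formulation of each method.

```python
import numpy as np


def lsa_coords(X, k, alpha=1.0):
    """Document coordinates from a truncated SVD of the raw matrix X.

    alpha is a hypothetical name for the singular value weighting exponent.
    """
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k] ** alpha


def ca_coords(X, k, alpha=1.0):
    """Document coordinates from correspondence analysis.

    The row and column margins are removed before the SVD, so the
    extracted dimensions describe departures from independence only.
    Assumes X has no all-zero rows or columns.
    """
    P = X / X.sum()                  # correspondence matrix
    r = P.sum(axis=1)                # row masses (document margins)
    c = P.sum(axis=0)                # column masses (term margins)
    E = np.outer(r, c)               # expected proportions under independence
    S = (P - E) / np.sqrt(E)         # standardized residuals
    U, s, _ = np.linalg.svd(S, full_matrices=False)
    return (U[:, :k] / np.sqrt(r)[:, None]) * s[:k] ** alpha


# Toy document-term matrix (rows: documents, columns: terms).
# Documents 0 and 1 share a term profile but differ in length,
# as do documents 2 and 3.
X = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [4.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 2.0, 6.0],
])

lsa = lsa_coords(X, k=1)
ca = ca_coords(X, k=1)
```

In this toy case the LSA coordinate of document 1 is twice that of document 0, purely because the document is twice as long, whereas CA gives the two documents the same coordinate because their term profiles are identical.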
Similar resources
Latent Semantic Analysis and Fiedler Retrieval
Latent semantic analysis (LSA) is a method for information retrieval and processing which is based upon the singular value decomposition. It has a geometric interpretation in which objects (e.g. documents and keywords) are placed in a low-dimensional geometric space. In this paper, we derive an alternative algebraic/geometric method for placing objects in space to facilitate information analysi...
Improving Text Segmentation Using Latent Semantic Analysis
Choi, Wiemer-Hastings and Moore (2001) proposed to use Latent Semantic Analysis to extract semantic knowledge from corpora in order to improve the accuracy of a text segmentation algorithm. By comparing the accuracy of the very same algorithm depending on whether or not it takes into account complementary semantic knowledge, they were able to show the benefit derived from such knowledge. In the...
Analysis of a Vector Space Model, Latent Semantic Indexing and Formal Concept Analysis for Information Retrieval
Latent Semantic Indexing (LSI), a variant of classical Vector Space Model (VSM), is an Information Retrieval (IR) model that attempts to capture the latent semantic relationship between the data items. Mathematical lattices, under the framework of Formal Concept Analysis (FCA), represent conceptual hierarchies in data and retrieve the information. However, both LSI and FCA use the data represen...
Improving Probabilistic Latent Semantic Analysis with Principal Component Analysis
Probabilistic Latent Semantic Analysis (PLSA) models have been shown to provide a better model for capturing polysemy and synonymy than Latent Semantic Analysis (LSA). However, the parameters of a PLSA model are trained using the Expectation Maximization (EM) algorithm, and as a result, the trained model is dependent on the initialization values so that performance can be highly variable. In th...
Query expansion based on relevance feedback and latent semantic analysis
Web search engines are one of the most popular tools on the Internet which are widely-used by expert and novice users. Constructing an adequate query which represents the best specification of users’ information need to the search engine is an important concern of web users. Query expansion is a way to reduce this concern and increase user satisfaction. In this paper, a new method of query expa...
Journal
Journal title: Journal of Intelligent Information Systems
Year: 2023
ISSN: 1573-7675, 0925-9902
DOI: https://doi.org/10.1007/s10844-023-00815-y